3 results
7 - Native Language Identification on EFCAMDAT
- from Part III - Data Driven Models
-
- By Xiao Jiang, Computer Laboratory, University of Cambridge, UK, Yan Huang, Department of Theoretical and Applied Linguistics, University of Cambridge, UK, Yufan Guo, IBM Research, USA, Jeroen Geertzen, Department of Theoretical and Applied Linguistics, University of Cambridge, UK, Theodora Alexopoulou, Department of Theoretical and Applied Linguistics, University of Cambridge, UK, Lin Sun, Greedy Intelligence, China, Anna Korhonen, Department of Theoretical and Applied Linguistics, University of Cambridge, UK
- Edited by Thierry Poibeau, Centre National de la Recherche Scientifique (CNRS), Paris, Aline Villavicencio, Universidade Federal do Rio Grande do Sul, Brazil
-
- Book:
- Language, Cognition, and Computational Models
- Published online:
- 30 November 2017
- Print publication:
- 25 January 2018, pp 159-184
-
- Chapter
Summary
Abstract
Native Language Identification (NLI) is a task aimed at determining the native language (L1) of learners of a second language (L2) on the basis of their written texts. To date, research on NLI has focused on relatively small corpora. We apply NLI to EFCAMDAT, an L2 English learner corpus that is not only multiple times larger than previous L2 corpora but also provides pseudo-longitudinal data across several proficiency levels. Based on accurate machine learning with a wide range of linguistic features, our investigation reveals interesting patterns in the longitudinal data that are useful for both further development of NLI and its application to research on L2 acquisition.
Introduction
Native language identification (NLI) is a task aimed at detecting the native language (L1) of writers on the basis of their second language (L2) production. NLI is important for natural language processing (NLP) applications including language tutoring systems and authorship profiling. Moreover, NLI can offer useful empirical data for research on L2 acquisition. For example, NLI can shed light on how L1 background influences L2 learning, and on differences between the writings of L2 learners across different L1 backgrounds.
To date, studies on NLI have focused on relatively small learner corpora. Furthermore, none of them have investigated the influence of L1s across L2 proficiency levels. Our work takes the first step toward addressing these problems. We apply NLI to EFCAMDAT, the EF-Cambridge Open Language Database (Geertzen, Alexopoulou, and Korhonen, 2013), an open-access L2 learner corpus.
EFCAMDAT consists of writings of learners submitted to Englishtown, the online school of EF. EFCAMDAT stands out for its size, diversity of student backgrounds, and coverage of the proficiency levels. The first release of 2013 (Geertzen, Alexopoulou, and Korhonen, 2013), on which this paper is based, amounts to 30 million words, a corpus multiple times larger than any other available L2 corpus. Using a standard machine learning–based methodology for NLI, we explore the optimal linguistic features for NLI on this data at different proficiency levels. We discover interesting patterns that can be useful for both further development of NLI and its application to research on L2 acquisition.
In this introductory section, we first review the history of research on NLI, and introduce the data sets that have been used in earlier NLI research. We then summarise our contribution briefly.
Use of vitamin D supplements during infancy in an international feeding trial
- Eveliina Lehtonen, Anne Ormisson, Anita Nucci, David Cuthbertson, Susa Sorkio, Mila Hyytinen, Kirsi Alahuhta, Carol Berseth, Marja Salonen, Shayne Taback, Margaret Franciscus, Teba González-Frutos, Tuuli E Korhonen, Margaret L Lawson, Dorothy J Becker, Jeffrey P Krischer, Mikael Knip, Suvi M Virtanen, Thomas Mandrup-Poulsen, Elias Arjas, Åke Lernmark, Barbara Schmidt, Jeffrey P. Krischer, Hans K. Åkerblom, Mila Hyytinen, Mikael Knip, Katriina Koski, Matti Koski, Eeva Pajakkala, Marja Salonen, David Cuthbertson, Jeffrey P. Krischer, Linda Shanker, Brenda Bradley, Hans-Michael Dosch, John Dupré, William Fraser, Margaret Lawson, Jeffrey L. Mahon, Mathew Sermer, Shayne P. Taback, Dorothy Becker, Margaret Franciscus, Anita Nucci, Jerry Palmer, Minna Pekkala, Suvi M. Virtanen, Jacki Catteau, Neville Howard, Patricia Crock, Maria Craig, Cheril L. Clarson, Lynda Bere, David Thompson, Daniel Metzger, Colleen Marshall, Jennifer Kwan, David K. Stephure, Daniele Pacaud, Wendy Schwarz, Rose Girgis, Marilyn Thompson, Shayne P. Taback, Daniel Catte, Margaret L. Lawson, Brenda Bradley, Denis Daneman, Mathew Sermer, Mary-Jean Martin, Valérie Morin, Lyne Frenette, Suzanne Ferland, Susan Sanderson, Kathy Heath, Céline Huot, Monique Gonthier, Maryse Thibeault, Laurent Legault, Diane Laforte, Elizabeth A. Cummings, Karen Scott, Tracey Bridger, Cheryl Crummell, Robyn Houlden, Adriana Breen, George Carson, Sheila Kelly, Koravangattu Sankaran, Marie Penner, Richard A. White, Nancy King, James Popkin, Laurie Robson, Eva Al Taji, Irena Aldhoon, Pavla Mendlova, Jan Vavrinec, Jan Vosahlo, Ludmila Brazdova, Jitrenka Venhacova, Petra Venhacova, Adam Cipra, Zdenka Tomsikova, Petra Krckova, Pavla Gogelova, Ülle Einberg, Mall-Anne Riikjärv, Anne Ormisson, Vallo Tillmann, Päivi Kleemola, Anna Parkkola, Heli Suomalainen, Anna-Liisa Järvenpää, Anu-Maaria Hämälainen, Hannu Haavisto, Sirpa Tenhola, Pentti Lautala, Pia Salonen, Susanna Aspholm, Heli Siljander, Carita Holm, Samuli Ylitalo, Raisa Lounamaa, Anja Nuuja, Timo Talvitie, Kaija Lindström, Hanna Huopio, Jouni Pesola, Riitta Veijola, Päivi Tapanainen, Abram Alar, Paavo Korpela, Marja-Liisa Käär, Taina Mustila, Ritva Virransalo, Päivi Nykänen, Bärbel Aschemeier, Thomas Danne, Olga Kordonouri, Dóra Krikovszky, László Madácsy, Yeganeh Manon Khazrai, Ernesto Maddaloni, Paolo Pozzilli, Carla Mannu, Marco Songini, Carine de Beaufort, Ulrike Schierloh, Jan Bruining, Margriet Bisschoff, Aleksander Basiak, Renata Wasikowa, Marta Ciechanowska, Grazyna Deja, Przemyslawa Jarosz-Chobot, Agnieszka Szadkowska, Katarzyna Cypryk, Malgorzata Zawodniak-Szalapska, Luis Castano, Teba Gonzalez Frutos, Mirentxu Oyarzabal, Manuel Serrano-Ríos, María Teresa Martínez-Larrad, Federico Gustavo Hawkins, Dolores Rodriguez Arnau, Johnny Ludvigsson, Malgorzata Smolinska Konefal, Ragnar Hanas, Bengt Lindblad, Nils-Osten Nilsson, Hans Fors, Maria Nordwall, Agne Lindh, Hans Edenwall, Jan Aman, Calle Johansson, Margrit Gadient, Eugen Schoenle, Dorothy Becker, Ashi Daftary, Margaret Franciscus, Carol Gilmour, Jerry Palmer, Rachel Taculad, Marilyn Tanner-Blasiar, Neil White, Uday Devaskar, Heather Horowitz, Lisa Rogers, Roxana Colon, Teresa Frazer, Jose Torres, Robin Goland, Ellen Greenberg, Maudene Nelson, Holly Schachner, Barney Softness, Jorma Ilonen, Massimo Trucco, Lynn Nichol, Erkki Savilahti, Taina Härkönen, Mikael Knip, Outi Vaarala, Kristiina Luopajärvi, Hans-Michael Dosch
-
- Journal:
- Public Health Nutrition / Volume 17 / Issue 4 / April 2014
- Published online by Cambridge University Press:
- 24 June 2013, pp. 810-822
-
- Article
Objective
To examine the use of vitamin D supplements during infancy among the participants in an international infant feeding trial.
Design
Longitudinal study.
Setting
Information about vitamin D supplementation was collected through a validated FFQ at the age of 2 weeks and monthly between the ages of 1 month and 6 months.
Subjects
Infants (n 2159) with a biological family member affected by type 1 diabetes and with increased human leucocyte antigen-conferred susceptibility to type 1 diabetes from twelve European countries, the USA, Canada and Australia.
Results
Daily use of vitamin D supplements was common during the first 6 months of life in Northern and Central Europe (>80 % of the infants), with somewhat lower rates observed in Southern Europe (>60 %). In Canada, vitamin D supplementation was more common among exclusively breast-fed than other infants (e.g. 71 % v. 44 % at 6 months of age). Less than 2 % of infants in the USA and Australia received any vitamin D supplementation. Higher gestational age, older maternal age and longer maternal education were associated study-wide with greater use of vitamin D supplements.
Conclusions
Most of the infants received vitamin D supplements during the first 6 months of life in the European countries, whereas in Canada only half and in the USA and Australia very few were given supplementation.
Phyto-oestrogen database of foods and average intake in Finland
- Liisa M. Valsta, Annamari Kilkkinen, Witold Mazur, Tarja Nurmi, Anna-Maija Lampi, Marja-Leena Ovaskainen, Tommi Korhonen, Herman Adlercreutz, Pirjo Pietinen
-
- Journal:
- British Journal of Nutrition / Volume 89 / Issue S1 / June 2003
- Published online by Cambridge University Press:
- 26 October 2011, pp. S31-S38
- Print publication:
- June 2003
-
- Article
Information on phyto-oestrogen intake in various populations has been scanty until now, primarily because data on the content of these compounds in foods were lacking. We report here on expansion of the Finnish National Food Composition Database (Fineli®) with values for the plant lignans matairesinol and secoisolariciresinol and the isoflavones daidzein and genistein. The values, expressed as aglycones, were based on food analyses (mainly GC–MS) or imputed from analytical data for 180 foods for lignans and 160 foods for isoflavones; additionally, over 1000 values were derived from the recipe database of Fineli. Average intake of these phyto-oestrogens was calculated using food consumption data of the National Dietary Survey FINDIET 1997, which was carried out in a random sample of the adult population in five areas in Finland. The dietary data were collected by 24 h recall (n=2862). The mean lignan intake was 434 (standard deviation (SD) 1575) μg/d and the mean isoflavone intake was 788 (SD 673) μg/d. Women had a higher lignan density (μg lignans/MJ) in their diet than men (P<0·05). Men had a higher mean daily isoflavone intake, 902 (SD 368) μg, than women, 668 (SD 963) μg (P<0·05). The sources of lignans were many: seeds, cereals, fruit, berries and vegetables. The main sources of isoflavones appeared to be processed meat products/sausages containing soya as an ingredient, and legumes as such. The average intake of lignans and isoflavones in Finland seems to be low, but intake varies throughout the population.